feedback training
19ca14e7ea6328a42e0eb13d585e4c22-AuthorFeedback.pdf
While harmonic mean score is increased from 52.2% to 62.2% on AwA1, there are less drastic changes between (a) and (b). Reviewer#2 - 1) Notation: As recommended, we will reexamine the notations and try to modify them to simple forms. Therefore, in our opinion, unknown attributes would be the better assumption for zero-shot problems, but it is still worth studying with known attributes, similar to other works mentioned above. (a-e)
Reviewer 1
We hope it will be better supplements than recommended, then we will include this figure. Figure 1: Structure visualization of learned dataset AwA1,2. While harmonic mean score is increased from 52.2% to 62.2% on AwA1, there are less drastic changes between (a) and (b). To generate missing datapoints by implementing Eqn. We will add this additional explanation in our paper.
Distributed Gossip-GAN for Low-overhead CSI Feedback Training in FDD mMIMO-OFDM Systems
Cao, Yuwen, Liu, Guijun, Ohtsuki, Tomoaki, Yang, Howard H., Quek, Tony Q. S.
The deep autoencoder (DAE) framework has proven efficient in reducing the channel state information (CSI) feedback overhead in massive multiple-input multiple-output (mMIMO) systems. However, the DAE approaches presented in prior works rely heavily on large-scale data collected through the base station (BS) for model training, which leads to excessive bandwidth usage and data privacy issues, particularly for mMIMO systems. When users move and encounter new channel environments, the existing CSI feedback models often need to be retrained. On returning to previous environments, however, these models perform poorly and face the risk of catastrophic forgetting. To solve these problems, we propose a novel gossiping generative adversarial network (Gossip-GAN)-aided CSI feedback training framework. Notably, Gossip-GAN enables CSI feedback training with low overhead while preserving users' privacy. Specifically, each user collects a small amount of data to train a GAN model. Meanwhile, a fully distributed gossip-learning strategy is exploited to avoid model overfitting and to accelerate model training. Simulation results demonstrate that Gossip-GAN can i) achieve CSI feedback accuracy similar to that of centralized training with real-world datasets, ii) address catastrophic forgetting challenges in mobile scenarios, and iii) greatly reduce the uplink bandwidth usage. In addition, our results show that the proposed approach is inherently robust.
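The abstract does not spell out how users exchange models, so the following is only a minimal sketch of one common form of gossip learning: in each round every user picks a random peer and the two replace their parameters with the pairwise average. The function name `gossip_round`, the flat list-of-floats model representation, the plain averaging merge rule, and the assumption that every user can reach every other user are all illustrative choices, not details taken from the paper.

```python
import random
from typing import List


def gossip_round(models: List[List[float]], rng: random.Random) -> None:
    """Run one gossip round in place (hypothetical merge rule: pairwise averaging).

    models[i] is the flat parameter vector held by user i. Each user contacts
    one randomly chosen peer, and both adopt the element-wise average of
    their two vectors. No central server sees any parameters or data.
    """
    n = len(models)
    if n < 2:
        return  # nothing to exchange with
    for i in range(n):
        # Assumption: fully connected users, so any other user can be picked.
        j = rng.choice([k for k in range(n) if k != i])
        averaged = [(a + b) / 2.0 for a, b in zip(models[i], models[j])]
        models[i] = list(averaged)
        models[j] = list(averaged)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three users with one-parameter "models" for illustration only.
    models = [[0.0], [4.0], [8.0]]
    for _ in range(50):
        gossip_round(models, rng)
    print(models)
```

Pairwise averaging keeps the sum of each parameter across users unchanged, so repeated rounds pull every local model toward the global mean without any user uploading raw data. A real system would exchange GAN weights rather than toy vectors and might weight the average or restrict peers to neighbors.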
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
He, Zhiwei, Wang, Xing, Jiao, Wenxiang, Zhang, Zhuosheng, Wang, Rui, Shi, Shuming, Tu, Zhaopeng
Insufficient modeling of human preferences within the reward model is a major obstacle to leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without a reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model (the QE-based reward model) to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect incorrect translations and assigns a penalty term to the QE-based rewards for the detected incorrect translations. Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: the proposed approach using a small amount of monolingual data can outperform systems using larger parallel corpora.
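The penalty idea described above can be sketched as follows. The abstract does not list the heuristic rules or the penalty value, so the checks used here (empty output and an out-of-range length ratio), the thresholds, the whitespace tokenization, and the names `looks_incorrect` and `penalized_reward` are assumptions made for illustration. The QE model is passed in as a plain callable rather than tied to any particular library.

```python
from typing import Callable


def looks_incorrect(source: str, translation: str,
                    min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Hypothetical heuristic: flag empty or badly length-mismatched output.

    Lengths are counted in whitespace-separated tokens, which is only a
    rough proxy and does not suit languages written without spaces.
    """
    src_len = len(source.split())
    hyp_len = len(translation.split())
    if hyp_len == 0:
        return True
    if src_len == 0:
        return False  # no basis for a ratio check
    ratio = hyp_len / src_len
    return ratio < min_ratio or ratio > max_ratio


def penalized_reward(source: str, translation: str,
                     qe_score: Callable[[str, str], float],
                     penalty: float = 1.0) -> float:
    """Return the QE score, minus a fixed penalty when a heuristic fires."""
    reward = qe_score(source, translation)
    if looks_incorrect(source, translation):
        reward -= penalty
    return reward


if __name__ == "__main__":
    # Stand-in scorer for demonstration; a real setup would call a QE model.
    def constant_qe(src: str, hyp: str) -> float:
        return 0.9

    print(penalized_reward("a short source sentence", "a plausible output", constant_qe))
    print(penalized_reward("a short source sentence", "", constant_qe))
```

Subtracting the penalty after scoring leaves the QE model untouched and only lowers the reward for outputs the rules flag, which is what stops the policy from being rewarded for translations the QE model overrates.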